Two-Phase Subsampling Designs for Genomic Resequencing Studies
Authors: T. Lumley et al.
Abstract
Targeted resequencing of DNA at specific genes or other genomic loci is now feasible for hundreds or thousands of samples, and costs for larger-scale resequencing are decreasing rapidly. For at least the next few years, resequencing will need to be confined to small subsets of the large samples on which genome-wide association studies have recently been performed. This paper describes some strategies for subsampling an existing cohort for resequencing, and flexibly analysing the resulting data. We illustrate these strategies by describing the actual design and planned analyses for the example that motivated our research, the CHARGES resequencing study carried out by the CHARGE (Cohorts in Heart and Aging Research in Genomic Epidemiology) Consortium.
1. Introduction. The past few years have seen an explosion in the availability of genotype information. Genome-wide association studies have been performed with hundreds of thousands of genetic markers on sample sizes from tens to hundreds of thousands of people, and these have found many, mostly weak, associations between genetic variants and biological or clinical variables. The greatest likely benefit from these studies is in improving the understanding of biological processes in health and disease, rather than in direct prediction. Biological understanding is hampered, however, by the fact that association studies typically find a set of nearby genetic markers rather than the genetic variant or subset of variants that truly affects biological processes. One approach to finding functional genetic variants near a set of markers is to determine the complete DNA sequence of a region of the genome. Analysis of complete sequence data should sharpen the association estimates compared with analysis only of marker data.
More importantly, as statistical association is unlikely to be sufficient to narrow the association down to a single variant, having sequence data facilitates biological investigation of candidate variants, whether in silico, in vitro, or in vivo. DNA resequencing is technologically feasible but very expensive at sample sizes sufficiently large for association studies. Although the costs are decreasing rapidly, over at least the next few years it will be necessary to resequence relatively small subsamples from the large samples that have participated in genome-wide association studies. Two large resequencing projects in cardiovascular disease were funded by the US National Institutes of Health under the American Recovery and Reinvestment Act. In this paper we present the design and analysis principles for one of these studies, …
Publication date: 2012